Text File | 1990-06-26 | 24.6 KB | 618 lines

CHAPTER 11 - ADDRESSING MODES AND POINTERS


In this chapter we are going to cover all possible ways of getting
data to and from memory with the different addressing modes. Read
this carefully, since it is likely this is the only time you will
ever see ALL addressing possibilities covered.

The easiest way to move data is if the data has a name and the data
is one or two bytes long. Take the following data:

      ; -----
      variable1   dw   2000
      variable2   db   -26
      variable3   dw   -589
      ; -----

We can write:

      mov   variable1, ax
      mov   cl, variable2
      mov   si, variable3

and the assembler will write the appropriate machine code for
moving the data. What can we do if the data is more than two bytes
long? Here is some more data:

      ; -----
      variable4   db   "This is a string of ascii data."
      variable5   dd   -291578
      variable6   dw   600 dup (-11000)
      ; -----

Variable4 is the address of the first byte of a string of ascii
data. Variable5 is a single piece of data, but it won't fit into an
8086 register since it is 4 bytes long. Variable6 is a 600 element
long array, with each element having the value -11000.

In order to deal with these, we need pointers. Some of you will be
flummoxed at this point, while those who are used to the C language
will feel right at home. A pointer is simply the address of a
variable. We use one of the 8086 registers to hold the address of a
variable, and then tell the 8086 that the register contains the
address of the variable, not the variable itself. It "points" to a
place in memory to send the data to or retrieve the data from. If
this seems a little confusing, don't worry; you'll get the hang of
it quickly.

As I have said before, the 8086 does not have general purpose
registers. Many instructions (such as LOOP, MUL, IDIV, ROL) work
only with specific registers. The same is true of pointers. You may
use only BX, SI, DI, and BP as pointers. The assembler will give
you an error if you try using a different register as a pointer.
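If the idea of "a register holding an address" is still fuzzy, it
may help to see it outside of assembler. The following is only a
rough Python model, not anything the assembler produces: the data
segment is treated as a flat array of bytes, and a pointer is just
a number that says where in that array to look. The helper names
(define, read_word, read_byte) and the dictionary of offsets are
made up for this sketch.

```python
import struct

# Hypothetical model: the data segment is a flat bytearray and a
# "pointer register" is nothing more than an integer offset into it.
data = bytearray()
offsets = {}

def define(name, fmt, value):
    """Append one value to the data segment and remember its offset."""
    offsets[name] = len(data)
    data.extend(struct.pack(fmt, value))

define("variable1", "<h", 2000)   # dw 2000  (little-endian word)
define("variable2", "<b", -26)    # db -26   (signed byte)
define("variable3", "<h", -589)   # dw -589

def read_word(pointer):
    """Fetch the two bytes that start at the given offset."""
    return struct.unpack_from("<h", data, pointer)[0]

def read_byte(pointer):
    """Fetch the single byte at the given offset."""
    return struct.unpack_from("<b", data, pointer)[0]

si = offsets["variable3"]    # SI holds the ADDRESS of variable3 ...
print(si, read_word(si))     # ... and the data is found through it
```

The point to take away is that the number in `si` is not the data.
It is where the data lives.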
______________________

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson

There are two ways to put an address in a pointer. For variable4,
we could write either:

      lea   si, variable4

or:

      mov   si, offset variable4

Both instructions will put the offset address of variable4 in
SI.{1} SI now 'points' to the first byte (the letter 'T') of
variable4. If we wanted to move the third byte of that array (the
letter 'i') to CL, how would we do it? First, we need to have SI
point to the third byte, not the first. That's easy:

      add   si, 2

But if we now write:

      mov   cl, si

we will generate an assembler error because the assembler will
think that we want to move the data in SI (a two byte number) to CL
(one byte). How do we tell the assembler that we are using SI as a
pointer? By enclosing SI in square brackets:

      mov   cl, [si]

Since CL is one byte, the assembler assumes you want to move one
byte. If you write:

      mov   cx, [si]

then the assembler assumes that you want to move a word (two
bytes). The whole thing now is:

      lea   si, variable4
      add   si, 2
      mov   cl, [si]

This puts the third byte of the string in CL. Remember, if a
register is in square brackets, then it is holding the ADDRESS of a
variable, and the 8086 will use the register to calculate where the
data is in memory.

____________________

   1 LEA stands for load effective address. Note that with LEA, we
   use only the name of the variable, while with:

      mov   si, offset variable4

   we need to use the word 'offset'. The exact difference between
   the two will be explained later.
____________________

What if we want to put 0s in all the elements of variable6? Here's
the code:

      mov   bx, offset variable6
      mov   ax, 0
      mov   cx, 600
zero_loop:
      mov   [bx], ax
      add   bx, 2
      loop  zero_loop

We add 2 to BX each time since each element of variable6 is a word
(two bytes) long.
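To see why the pointer moves by 2 and not by 1, here is zero_loop
again as a small Python sketch. It assumes the array sits at offset
0 of a byte array standing in for memory. The function name
zero_words is invented for the illustration, and the variables bx
and cx simply play the parts of the registers.

```python
import struct

ELEMENTS = 600
# variable6 dw 600 dup (-11000), stored as little-endian 16-bit words
memory = bytearray(struct.pack("<h", -11000) * ELEMENTS)

def zero_words(mem, start, count):
    """Mimic zero_loop: store a word of 0 at [bx], then add 2 to bx."""
    bx = start                     # mov bx, offset variable6
    cx = count                     # mov cx, 600
    while cx != 0:
        struct.pack_into("<h", mem, bx, 0)   # mov [bx], ax  (ax = 0)
        bx += 2                    # add bx, 2
        cx -= 1                    # loop zero_loop
    return bx                      # where the pointer ended up

end = zero_words(memory, 0, ELEMENTS)
print(end)   # the pointer has advanced 2 bytes for every element
```

Stepping by 1 instead of 2 would make every store overlap the
previous one, and the loop would stop halfway through the array.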
There is another way of writing this:

      mov   bx, offset variable6
      mov   cx, 600
zero_loop:
      mov   [bx], 0
      add   bx, 2
      loop  zero_loop

Unfortunately, this will generate an assembler error. Why? If the
assembler sees:

      mov   [bx], ax

it knows that you want to move what is in AX to the address in BX,
and AX is one word (two bytes) long so it generates the machine
code for a word move. If the assembler sees:

      mov   [bx], al

it knows that you want to move what is in AL to the address in BX,
and AL is one byte long, so it generates the machine code for a
byte move. If the assembler sees:

      mov   [bx], 0

it doesn't know whether you want a byte move or a word move.

The 8086 assembler has implicit sizing. It is the assembler's job
to look at each instruction and decide whether you want to operate
on a byte or a word. Other microprocessors do things differently.
On the Motorola 68000, the assembler uses explicit sizing. Each
instruction must explicitly state whether it is a byte or a
word.{2} On the 68000 you have:

      move.b   #213, (A1)
      move.w   #213, (A1)

The first instruction says to move a byte (the number 213) to the
address in register A1 while the second instruction says to move a
word (the number 213) to the address in register A1.{3}

____________________

   2 Any of you who use the 68000 assembler know that this is
   fudging the facts a little bit.
____________________

Back to the 8086. If the 8086 assembler looks at an instruction and
it can't tell whether you want to move a byte or a word, it
generates an error. When you use pointers with constants, you
should explicitly state whether you want a byte or a word. The
proper way to do this is to use the reserved words BYTE PTR or WORD
PTR.

      mov   [bx], BYTE PTR 213
      mov   [bx], WORD PTR 213

These stand for byte pointer and word pointer respectively. I find
this terminology exceptionally clumsy, but that's life. Whenever
you are moving a constant with a pointer, you should specify either
BYTE PTR or WORD PTR.
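The reason the assembler refuses to guess is that the two choices
do different things to memory. Here is a small Python sketch of the
difference. The functions store_byte and store_word are invented
for the illustration, and memory is filled with FFh beforehand so
that you can see which bytes get touched.

```python
def store_byte(mem, address, value):
    """Model of a byte-sized store: one byte of memory changes."""
    mem[address] = value & 0xFF

def store_word(mem, address, value):
    """Model of a word-sized store: two bytes change, low byte first."""
    mem[address] = value & 0xFF
    mem[address + 1] = (value >> 8) & 0xFF

as_byte = bytearray([0xFF, 0xFF, 0xFF, 0xFF])
as_word = bytearray([0xFF, 0xFF, 0xFF, 0xFF])

store_byte(as_byte, 0, 213)   # the BYTE PTR case
store_word(as_word, 0, 213)   # the WORD PTR case

print(list(as_byte))   # only the first byte is overwritten
print(list(as_word))   # the second byte is overwritten too (with 0)
```

The same constant, 213, wipes out the neighboring byte in one case
and leaves it alone in the other, so the size has to come from you.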
The Microsoft assembler makes some assumptions about the size of a
constant. If the number is 256 or below (either positive or
negative), you MUST explicitly state whether it is a byte or a word
operation. If the number is 257 or above (either positive or
negative), the assembler assumes that you want a word operation.

Here's the previous code rewritten correctly:

      mov   bx, offset variable6
      mov   cx, 600
zero_loop:
      mov   [bx], WORD PTR 0
      add   bx, 2
      loop  zero_loop

Let's add 435 to every element in the variable6 array:

      mov   bx, offset variable6
      mov   cx, 600
add_loop:
      add   [bx], WORD PTR 435
      add   bx, 2
      loop  add_loop

How about multiplying every element in the array by 12?

      mov   di, offset variable6
      mov   cx, 600
      mov   si, 12
mult_loop:
      mov   ax, [di]
      imul  si
      mov   [di], ax
      add   di, 2
      loop  mult_loop

____________________

   3 A1 is a 68000 register.
____________________

None of these examples did any error checking, so if the result was
too large, the overflow was ignored. This time we used DI for a
change of pace. Remember, we may use BX, SI, DI or BP, but no
others.

You will notice that in all these examples, we started at the
beginning of the array and went step by step through the array.
That's fine, and that's what we normally would do, but what if we
wanted to look at individual elements? Here's a sample program:

; + + + + + START DATA BELOW THIS LINE
;
poem_array        db   "She walks in Beauty, like the night"
                  db   "Of cloudless climes and starry skies;"
                  db   "And all that's best of dark and bright"
                  db   "Meet in the aspect ratio of 1 to 3.14159"
character_count   db   149
; + + + + + END DATA ABOVE THIS LINE

; + + + + + START CODE BELOW THIS LINE

      mov   bx, offset poem_array
      mov   dl, character_count

character_loop:
      sub   ax, ax              ; clear ax
      call  get_unsigned_byte
      dec   al                  ; character #1 = array[0]
      cmp   al, dl              ; out of range?
      ja    character_loop      ; then try again
      mov   si, ax              ; move char # to pointer register
      mov   al, [bx+si]         ; character to al
      call  print_ascii_byte
      jmp   character_loop

; + + + + + END CODE ABOVE THIS LINE

You enter a number and the program prints the corresponding
character. Before starting, we put the array address in BX and the
maximum character count in DL. After getting the number from
get_unsigned_byte, we decrement AL since the first character is
actually poem_array[0]. The character count has been reduced by 1
to reflect this fact. It also makes 0 an illegal entry. Notice that
the program checks to make sure you don't go past the end of the
poem.

This time we use BX to mark the beginning of the array and SI to
count the number of the character. Once again, there are only
specific combinations of pointers that can be used. They are:

      BX with either SI or DI (but not both)
      BP with either SI or DI (but not both)

My version of the Microsoft assembler (v5.1) recognizes the forms
[bx+si], [si+bx], [bx][si], [si][bx], [si]+[bx] and [bx]+[si] as
the same thing and produces the same machine code for all six.

We can get even more complicated, but to show that, we need
structures. In databases they are called records. In C they are
called structures; in any case they are the same thing - a group of
different types of data in some standard order. After the group is
defined, we usually make an array with the identical structure for
each element of the array.{4} Let's make a structure for an address
book.

      last_name    db   15 dup (?)
      first_name   db   15 dup (?)
      age          db   ?
      tel_no       db   10 dup (?)

In this case, all the data is bytes, but that is not necessary. It
can be anything. Each separate piece of data is called a FIELD. We
have the last_name field, the first_name field, the age field, and
the tel_no field. Four fields in all. The structure is 41 bytes
long. What if we want to have a list of 100 names in our telephone
book?
We can allocate memory space with the following definition:

      address_book   db   100 dup ( 41 dup (' '))       {5}

____________________

   4 If you don't know about structures or records, now would be a
   good time to stop and go to a reference book about them. They
   are not actually covered here.

   5 Nesting of dup statements is allowed. Rather than having
   uninitialized data, this has blanks in all the spaces.
____________________

Well, that allocates room in memory, but how do we get to anything?
First, we need the array itself:

      mov   bx, offset address_book

Then we need one specific entry. Let's take entry 29 (which is
address_book[28]). Each entry is 41 bytes long, so:

      mov   ax, 28            ; entry (less 1)
      mov   cx, 41            ; entry length
      mul   cx
      mov   di, ax            ; move to pointer

That gives us the entry, but if we want to get the age, that's not
the first byte of the structure, it's the 31st byte (actually
address_book[28] + 30 since the first byte is at +0). We get it by
writing:

      mov   dl, [bx+di+30]

This is the most complex thing we have - two pointers plus a
constant. The total code is then:

      mov   bx, offset address_book
      mov   ax, 28            ; entry (less 1)
      mov   cx, 41            ; entry length
      mul   cx                ; entry offset from array[0]
      mov   di, ax            ; move entry offset to pointer
      mov   dl, [bx+di+30]    ; total address

Though the machine code has only one constant in the code, the
assembler will allow you to put a number of constants in the
assembler instruction. It will add them together for you and
resolve them into one number.{6}

Once again, there are a limited number of registers - they are the
same registers as before:

      BX with either SI or DI (but not both) plus constant
      BP with either SI or DI (but not both) plus constant

We can work with structures on the machine level, but it looks like
it's going to be hard to keep track of where each field is.
Actually, it isn't so bad because of:


OUR FRIEND, THE EQU STATEMENT

The assembler allows you to do substitution.
If you write:

      somestuff   EQU   37 * 44

then every place that the assembler finds the word "somestuff", it
will substitute what is on the right side of the EQU. Is that a
number or text? Sometimes it's a number, sometimes it's text. Here
are four statements which are defined totally in terms of numbers.
This is from the assembler listing. (The assembler lists how it has
evaluated the EQU statement on the left after the equal sign.)

 = 0023              statement1   EQU   5 * 7
 = 0025              statement2   EQU   statement1 + 6 - 4
 = 000F              statement3   EQU   statement2 - 22
 = 001F              statement4   EQU   statement3 + 16

and the assembler thinks of these as numbers (these numbers are in
hex). Now in the next set, with only a minor change:

 = [bp + 3]                       statement1   EQU   [bp + 3]
 = [bp + 3] + 6 - 4               statement2   EQU   statement1 + 6 - 4
 = [bp + 3] + 6 - 4 - 22          statement3   EQU   statement2 - 22
 = [bp + 3] + 6 - 4 - 22 + 16     statement4   EQU   statement3 + 16

the assembler thinks of it as text. Obviously, the fact that it can
be either may cause you some problems along the way. Consult the
assembler manual for ways to avoid the problem.

____________________

   6 And it does it quite well. The assembler correctly evaluated
   the following:

      add   ax, (-3*81)+44/8+[si+27]+6+[bx]-7+(43*96)-2

   Not bad, huh?
____________________

Now we have a tool to deal with structures. Let's look at that
structure again.

      last_name    db   15 dup (?)
      first_name   db   15 dup (?)
      age          db   ?
      tel_no       db   10 dup (?)

We don't actually need a data definition to make the structure, we
need equates:

      LAST_NAME    EQU   0
      FIRST_NAME   EQU   15
      AGE          EQU   30
      TEL_NO       EQU   31

This gives us the offset from the beginning of each record.
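For those who like to check the arithmetic, here is the same idea
as a small Python sketch. The four field offsets are the equates
above. RECORD_LENGTH, field_address and the stored age of 42 are
names and values invented for the illustration, and the array is
assumed to start at offset 0.

```python
# Field offsets, mirroring the equates in the text
LAST_NAME = 0
FIRST_NAME = 15
AGE = 30
TEL_NO = 31
RECORD_LENGTH = 41    # 15 + 15 + 1 + 10 bytes

def field_address(base, entry_number, field_offset):
    """Offset of one field of one entry in the array.

    base         -- offset of the start of the array (the BX role)
    entry_number -- entry counted from 1, as in the text
    field_offset -- one of the equates above (the constant)
    """
    entry_offset = (entry_number - 1) * RECORD_LENGTH   # the DI role
    return base + entry_offset + field_offset

book = bytearray(b" " * (100 * RECORD_LENGTH))   # 100 blank records
book[field_address(0, 29, AGE)] = 42             # hypothetical age
print(field_address(0, 29, AGE))
```

The three terms added together in field_address are the same three
things that appear inside the square brackets: a base pointer, an
entry offset in a second pointer, and a constant.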
If we again define:

      address_book   db   100 dup ( 41 dup (' '))

then to get the age field of entry 87, we write:

      mov   bx, offset address_book
      mov   ax, 86            ; entry (less 1)
      mov   cx, 41            ; entry length
      mul   cx                ; entry offset from array[0]
      mov   di, ax            ; move entry offset to pointer
      mov   dl, [bx+di+AGE]   ; total address

This is a lot of work for the 8086, but that is normal with complex
structures. The only thing that takes a lot of time is the
multiplication, but if you need it, you need it.{7}

____________________

   7 You will see more of the EQU statement.
____________________

How about a two dimensional array of integers, 60 X 40

      int_array   dw   40 dup ( 60 dup ( 0 ))

These are initialized to 0. For our purposes, we'll assume that the
first number is the row number and the second number is the column
number; i.e. array [6,13] is row 6, column 13. We will have 40 rows
of 60 columns. For ease of calculation, the first array element is
int_array [0,0]. (If it is your array, you can set it up any way
you want {8}). Each row is 60 words (120 bytes) long. To get to
int_array [23, 45] we have:

      mov   ax, 120                ; length of one row in bytes
      mov   cx, 23                 ; row number
      mul   cx
      mov   bx, ax                 ; row offset to bx
      add   bx, offset int_array   ; add the start of the array
      mov   si, 45                 ; column offset
      sal   si, 1                  ; multiply column offset by 2
                                   ; (for word size)
      mov   dx, [bx+si]            ; integer to dx

Using SAL instead of MUL is about 50 times faster. Since most
arrays you will be working with are either byte, word, or double
word (4 bytes) arrays, you can save a lot of time. Let
ELEMENT_NUMBER be the array number (starting at 0) of the desired
element in a one-dimensional array. For byte arrays, no
multiplication is needed. For a word:

      mov   di, ELEMENT_NUMBER
      sal   di, 1               ; multiply by 2

and for a double word (4 bytes):

      mov   di, ELEMENT_NUMBER
      sal   di, 1
      sal   di, 1               ; multiply by 4

This means that a one-dimensional array can be accessed very
quickly as long as the element length is a power of 2 - either 2, 4
or 8.
Since the standard 8086 data types are all 1, 2, 4, or 8 bytes
long, one dimensional arrays are fast. Others are not so fast.

____________________

   8 Bearing in mind that all compiled languages have fixed formats
   for arrays. If you want your array to interact with C, Fortran,
   Pascal or Basic, you'd better be sure you have the right format.
____________________

As a quick review before going on, these are the legal ways to
address a variable on the 8086:

   (1) by name.

      mov   dx, variable1

   It is also possible to have name + constant.

      mov   dx, variable1 + 27

   The assembler will resolve this into a single offset number and
   will give the appropriate information to the linker.

   (2) with the single pointers BX, SI, DI and BP (which are
   enclosed in square brackets).

      mov   cx, [si]
      xor   al, [bx]
      add   [di], cx
      sub   [bp], dh

   (3) with the single pointers BX, SI, DI and BP (which are
   enclosed in square brackets) plus a constant.

      mov   cx, [si+421]
      xor   al, 18+[bx]
      add   93+[di]-7, cx
      sub   (54/7)+81-3+[bp]-19, dh

   (4) with the double pointers [bx+si], [bx+di], [bp+si], [bp+di]
   (which are enclosed in square brackets).

      mov   cx, [bx][si]
      xor   al, [di][bx]
      add   [bp]+[di], cx
      sub   [di+bp], dh

   (5) with the double pointers [bx+si], [bx+di], [bp+si], [bp+di]
   (which are enclosed in square brackets) plus a constant.

      mov   cx, [bx][si+57]
      xor   al, 45+[di+23][bx+15]-94
      add   [bp]+[di]-444, cx
      sub   [6+di+bp]-5, dh

These are ALL the addressing modes allowed on the 8086. As for the
constants, it is the ASSEMBLER'S job to resolve all numbers in the
expression into a single constant. If your expression won't resolve
into a constant, it is between you and the assembler. It has
nothing to do with the 8086 chip.
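All five cases boil down to one rule: at most one of BX or BP, at
most one of SI or DI, and one constant, all added together. Here is
that rule as a Python sketch. It is a model of the arithmetic only,
not of how the chip is built. The function effective_address and
the register values in regs are invented for the illustration.

```python
BASES = {"bx", "bp"}
INDEXES = {"si", "di"}

def effective_address(regs, pointers=(), constant=0):
    """Return the 16-bit offset for an operand, or raise ValueError
    if the register combination is not one of the legal ones.

    regs     -- dict of register name -> value (made-up CPU state)
    pointers -- register names used inside the square brackets
    constant -- the single constant the assembler resolved
    """
    names = [p.lower() for p in pointers]
    bases = [n for n in names if n in BASES]
    indexes = [n for n in names if n in INDEXES]
    if len(bases) + len(indexes) != len(names):
        raise ValueError("only BX, BP, SI and DI can be pointers")
    if len(bases) > 1 or len(indexes) > 1:
        raise ValueError("at most one of BX/BP and one of SI/DI")
    total = constant + sum(regs[n] for n in names)
    return total & 0xFFFF          # offsets wrap around at 64K

regs = {"bx": 100, "bp": 200, "si": 20, "di": 30}   # made-up values
print(effective_address(regs, (), 27))              # case (1)
print(effective_address(regs, ("si",)))             # case (2)
print(effective_address(regs, ("bp",), -5))         # case (3)
print(effective_address(regs, ("bx", "si")))        # case (4)
print(effective_address(regs, ("bx", "si"), 57))    # case (5)
```

Asking for ("si", "di") or ("ax",) raises an error in this model,
which is the Python stand-in for the assembler refusing the
instruction.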